    Albayzin 2018 Evaluation: The IberSpeech-RTVE Challenge on Speech Technologies for Spanish Broadcast Media

    The IberSpeech-RTVE Challenge, presented at IberSpeech 2018, is a new Albayzin evaluation series supported by the Spanish Thematic Network on Speech Technologies (Red Temática en Tecnologías del Habla, RTTH). The evaluation focused on speech-to-text transcription, speaker diarization, and multimodal diarization of television programs. For this purpose, the Corporación Radio Televisión Española (RTVE), the main public service broadcaster in Spain, and the RTVE Chair at the University of Zaragoza made more than 500 h of broadcast content and subtitles available to researchers. The dataset included about 20 programs of different kinds and topics produced and broadcast by RTVE between 2015 and 2018. The programs posed several challenges for speech technologies, including the diversity of Spanish accents, overlapping speech, spontaneous speech, acoustic variability, background noise, and specialized vocabulary. This paper describes the database and the evaluation process and summarizes the results obtained.

    Speech technologies: new opportunities for television archives

    As the volume of audiovisual content to be identified and analysed has grown in recent years, and the resources available to handle it have shrunk, artificial intelligence has become a sought-after tool for television archives. Future automatic metadata extraction workflows will be based on three complementary technologies: computer vision, speech technologies and natural language processing. These technologies will give access to more content and allow it to be analysed at a finer granularity. The role of the documentalist will change once again: training algorithms and validating data will be two new tasks for these professionals. In this new scenario, in which artificial intelligence brings new opportunities to television archives, the RTVE Corporation and the University of Zaragoza signed an agreement to create the Cátedra RTVE - Universidad de Zaragoza in July 2017. The main goal of this Chair is to carry out educational and research activities related to Big Data and its application to the analysis of audiovisual and sound content. In 2018 the Chair promoted the IberSpeech 2018 Challenge, which made more than 500 hours of audiovisual content in Spanish available to the scientific community. IberSpeech 2018 also allowed national and international research groups to test their algorithms on three tasks: speech to text, speaker diarization and multimodal diarization. The results obtained show the technological difficulties that still have to be overcome. They should also be examined from the user’s perspective, in order to establish how much error is tolerable in automatic transcription in three different areas: editing, broadcasting and archiving.

    Artificial intelligence applied to radio news: a case study of automatic segmentation of news items at RNE

    The results of a project on news segmentation at Radio Nacional de España (RNE), carried out by the RTVE Technological Innovation and Media Management areas, are presented. The aim of this project is to apply artificial intelligence to automatically transcribe and cut the news items that make up a radio news program. Its main goals are to increase the accessibility of the content and to allow it to be reused on various platforms and social media. The project was planned in two phases, covering system configuration and service delivery. The minimum quality criteria were defined in advance, both for automatic speech transcription and for news segmentation. For the speech-to-text process, the highest word error rate (WER) allowed was 10%, while the precision required for news segmentation was 85%. System performance in both transcription and segmentation was considered sufficient, although a higher degree of accuracy in news cutting is expected in the coming months. The results show that, even with these fairly mature technologies, adjustment and learning processes and human intervention are still necessary.
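
    The two acceptance criteria quoted above (WER of at most 10% and segmentation precision of 85%) can be illustrated with a minimal sketch. The abstract does not say how either metric was computed, so everything here is an assumption: WER is taken as the standard word-level edit distance divided by the number of reference words, precision as the share of automatically cut news items judged correct, and the 85% figure is read as a minimum. All function and constant names are hypothetical.

```python
# Illustrative sketch only: the metric definitions and all names below are
# assumptions, not the method used in the RNE project.

MAX_WER = 0.10        # highest word error rate allowed (from the abstract)
MIN_PRECISION = 0.85  # segmentation precision, read here as a minimum


def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference words seen so
    # far (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


def precision(true_positives: int, false_positives: int) -> float:
    """Share of predicted news items (cuts) that were judged correct."""
    predicted = true_positives + false_positives
    if predicted == 0:
        raise ValueError("no predicted items to score")
    return true_positives / predicted


def meets_quality_criteria(wer: float, seg_precision: float) -> bool:
    """Check both thresholds at once."""
    return wer <= MAX_WER and seg_precision >= MIN_PRECISION


if __name__ == "__main__":
    ref = "the government approved the new budget today"
    hyp = "the government approved a new budget"
    wer = word_error_rate(ref, hyp)   # one substitution + one deletion
    seg = precision(true_positives=18, false_positives=2)
    print(f"WER={wer:.3f} precision={seg:.2f} "
          f"pass={meets_quality_criteria(wer, seg)}")
```

    In this toy example the transcription has two errors against seven reference words, so it fails the WER criterion even though the segmentation precision is above the threshold; both conditions have to hold for the output to be accepted.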

    Collective memory: content for remembrance, from the archive to the RTVE website

    “La cabina” (1972) by Antonio Mercero, “El Asfalto” (1966) from Narciso Ibáñez Serrador’s “Historias para no dormir”, the acclaimed theatre productions of Estudio 1 (1963-1983), and the series “El Quijote” (1991) by Manuel Gutiérrez Aragón are just a few examples of the television classics that make up the collective memory of several generations of Spaniards. Alongside these titles, newer programs such as “Cachitos de hierro y Cromo”, “Viaje al centro de la tele” and “¿Te acuerdas?” seek to showcase the value of TVE’s historical collection. All of them share a common space: the RTVE Archive on the web. Comprising thousands of hours of broadcasts and growing continuously since September 2008, it is the showcase for a major effort to recover TVE’s historical holdings, carried out in coordination by the units that make up the Fondo Documental TVE with the collaboration of the Sistemas, Medios TVE and Interactivos areas. The documentation processes involved and the technological tools needed to move TVE’s audiovisual heritage from the archive to the web are described.

    Beyond the reference desk: from email to Web 2.0

    Automatic description of audiovisual archives: NeuralTalk, a video-to-text model applied to the RTVE archive

    Objective: To assess the maturity of video-to-text (video captioning) systems for automated image description in a television archive. -- Methodology: A proof of concept tested an ad hoc video-to-text model in three iterations between June 2016 and January 2017. In the first and second iterations the model was used to analyse a selection of content from the archives of the Spanish Radio and Television Corporation (RTVE), and the descriptions it generated were evaluated to determine the model’s success rate, i.e., how close each description came to one a human could have provided. In the third iteration, before the content was analysed, the model was trained using deep learning techniques in order to improve the results. -- Results: The results indicate that the technology is promising, although the mechanisms needed to put it into production in television archives require further study.

    Automatic description of audiovisual archives: NeuralTalk, a video2text model applied to the RTVE archive

    Objective: To assess the maturity of video-to-text (video captioning) systems for automated image description in a television archive. -- Methodology: A proof of concept tested an ad hoc video-to-text model in three iterations between June 2016 and January 2017. In the first and second iterations the model was used to analyse a selection of content from the archives of the Spanish Radio and Television Corporation (RTVE), and the descriptions it generated were evaluated to determine the model’s success rate, i.e., how close each description came to one a human could have provided. In the third iteration, before the content was analysed, the model was trained using deep learning techniques in order to improve the results. -- Results: The results indicate that the technology is promising, although the mechanisms needed to put it into production in television archives require further study.